|
In statistics, a studentized residual is the quotient obtained by dividing a residual by an estimate of its standard deviation. Typically the standard deviations of residuals in a sample vary greatly from one data point to another even when the errors all have the same standard deviation, particularly in regression analysis; thus it does not make sense to compare residuals at different data points without first studentizing. It is a form of Student's ''t''-statistic, with the estimate of error varying between points. This is an important technique in the detection of outliers. It is among several statistical notions named in honor of William Sealy Gosset, who wrote under the pseudonym ''Student''; dividing by an ''estimate'' of scale is called studentizing, in analogy with standardizing and normalizing.

==Motivation==
The key reason for studentizing is that, in regression analysis of a multivariate distribution, the variances of the ''residuals'' at different input variable values may differ, even if the variances of the ''errors'' at these different input variable values are equal. The issue is the difference between errors and residuals in statistics, particularly the behavior of residuals in regressions.

Consider the simple linear regression model
: <math>Y = \alpha_0 + \alpha_1 X + \varepsilon.</math>
Given a random sample (''X''<sub>''i''</sub>, ''Y''<sub>''i''</sub>), ''i'' = 1, ..., ''n'', each pair (''X''<sub>''i''</sub>, ''Y''<sub>''i''</sub>) satisfies
: <math>Y_i = \alpha_0 + \alpha_1 X_i + \varepsilon_i,</math>
where the ''errors'' ''ε''<sub>''i''</sub> are independent and all have the same variance ''σ''<sup>2</sup>. The residuals are not the true, unobservable errors, but rather ''estimates'' of the errors based on the observable data. When the method of least squares is used to estimate ''α''<sub>0</sub> and ''α''<sub>1</sub>, the residuals <math>\widehat{\varepsilon}_i</math>, unlike the errors <math>\varepsilon_i</math>, cannot be independent, since they satisfy the two constraints
: <math>\sum_{i=1}^n \widehat{\varepsilon}_i = 0</math>
and
: <math>\sum_{i=1}^n \widehat{\varepsilon}_i X_i = 0.</math>
(Here ''ε''<sub>''i''</sub> is the ''i''th error, and <math>\widehat{\varepsilon}_i</math> is the ''i''th residual.)
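The two constraints on least-squares residuals can be checked numerically. The following is a minimal sketch, not part of the original text: the helper <code>fit_simple_ols</code>, the simulated data, and the "true" parameter values (intercept 2, slope 0.5, error standard deviation 1) are all assumptions chosen purely for illustration.

```python
import random


def fit_simple_ols(xs, ys):
    """Return least-squares estimates (a0, a1) for the line y = a0 + a1 * x.

    Hypothetical helper for illustration; uses the closed-form solution
    for simple linear regression.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1


random.seed(0)  # fixed seed so the run is repeatable
xs = [float(i) for i in range(1, 11)]
# Assumed data-generating model: intercept 2, slope 0.5, i.i.d. N(0, 1) errors.
ys = [2.0 + 0.5 * x + random.gauss(0.0, 1.0) for x in xs]

a0, a1 = fit_simple_ols(xs, ys)
residuals = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]

# Both sums vanish up to floating-point rounding, whatever the errors were.
sum_resid = sum(residuals)
sum_resid_x = sum(r * x for r, x in zip(residuals, xs))
print(sum_resid, sum_resid_x)
```

Because the ''n'' residuals satisfy two linear constraints, knowing any ''n'' − 2 of them determines the remaining two, which is why they cannot be independent.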
Moreover, and most importantly, the residuals, unlike the errors, ''do not all have the same variance:'' the variance decreases as the corresponding ''x''-value gets farther from the average ''x''-value. This is a feature of the regression fitting values at the ends of the domain more closely, not of the data itself, and it is also reflected in the influence functions of the data points on the regression coefficients: endpoints have more influence. It can also be seen from the fact that the residuals at the endpoints depend greatly on the slope of the fitted line, while the residuals in the middle are relatively insensitive to the slope.

The fact that ''the variances of the residuals differ,'' even though ''the variances of the true errors are all equal'' to each other, is the ''principal reason'' for the need for studentization. It is not simply a matter of the population parameters (mean and standard deviation) being unknown; rather, ''regressions'' yield ''different residual distributions'' at ''different data points,'' unlike ''point estimators'' of univariate distributions, which share a ''common distribution'' for residuals.
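The dependence of the residual variance on the ''x''-value can be made concrete with the standard leverage formula for simple linear regression, which is not stated in the passage above: the variance of the ''i''th residual is ''σ''<sup>2</sup>(1 − ''h''<sub>''ii''</sub>), where ''h''<sub>''ii''</sub> = 1/''n'' + (''x''<sub>''i''</sub> − x̄)<sup>2</sup> / Σ(''x''<sub>''j''</sub> − x̄)<sup>2</sup>. The sketch below computes these leverages and the resulting internally studentized residuals. The function names and the data values are assumptions made for illustration only.

```python
import math


def leverages(xs):
    """Leverage h_ii of each point in a simple linear regression on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1.0 / n + (x - x_bar) ** 2 / sxx for x in xs]


def studentized_residuals(xs, ys):
    """Internally studentized residuals: r_i / (sigma_hat * sqrt(1 - h_ii))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    resid = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]
    # Estimate sigma from the residual sum of squares with n - 2 degrees
    # of freedom (two fitted parameters).
    sigma_hat = math.sqrt(sum(r * r for r in resid) / (n - 2))
    return [r / (sigma_hat * math.sqrt(1.0 - h))
            for r, h in zip(resid, leverages(xs))]


# Made-up illustrative data, not taken from any real study.
xs = [float(i) for i in range(1, 11)]
ys = [2.1, 2.9, 3.4, 4.2, 4.4, 5.1, 5.3, 6.2, 6.4, 7.1]

h = leverages(xs)
# Residual variance as a fraction of sigma^2: smallest at the endpoints.
variance_factors = [1.0 - hi for hi in h]
t = studentized_residuals(xs, ys)

for x, vf, ti in zip(xs, variance_factors, t):
    print(f"x={x:4.1f}  var factor={vf:.3f}  studentized residual={ti:+.3f}")
```

With equally spaced ''x''-values the factor 1 − ''h''<sub>''ii''</sub> is largest near the middle of the range and smallest at the two ends, so raw residuals at the endpoints look smaller than they would if every residual shared one variance. Dividing by the estimated standard deviation puts all points on a common scale.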
|